Optimizing Classifier Performance Via the Wilcoxon-Mann-Whitney Statistic

Authors

  • Lian Yan
  • Robert Dodier
  • Michael C. Mozer
  • Richard Wolniewicz
Abstract

Cross entropy and mean squared error are typical cost functions used to optimize classifier performance. The goal of the optimization is usually to achieve the best correct classification rate. However, for many two-class real-world problems, the ROC curve is a more meaningful performance measure. We demonstrate that minimizing cross entropy or mean squared error does not necessarily maximize the area under the ROC curve (AUC). We then consider alternative objective functions for training a classifier to maximize the AUC directly. We propose an objective function that is an approximation to the Wilcoxon-Mann-Whitney statistic, which is equivalent to AUC. The proposed objective function is differentiable, so gradient-based methods can be used to train the classifier. After discussing the improved results of the new objective function over several UCI data sets, we apply the new objective function to real-world customer behavior prediction problems for a wireless service provider and a cable service provider, and achieve reliable and significant improvements in the ROC curve.
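To make the relationship between the Wilcoxon-Mann-Whitney statistic and a differentiable objective concrete, here is a minimal Python sketch. The first function computes the exact statistic as the fraction of (positive, negative) score pairs that are ranked correctly, which is the AUC when there are no ties. The abstract does not spell out the approximation, so the second function is only an assumed hinge-power surrogate; the names `soft_wmw_loss`, `gamma`, and `p`, and their default values, are illustrative choices rather than anything stated in the text.

```python
def wmw_statistic(pos_scores, neg_scores):
    """Exact Wilcoxon-Mann-Whitney statistic.

    Fraction of (positive, negative) pairs in which the positive example
    receives the strictly higher score. Ties count as incorrect here.
    """
    wins = sum(1 for x in pos_scores for y in neg_scores if x > y)
    return wins / (len(pos_scores) * len(neg_scores))


def soft_wmw_loss(pos_scores, neg_scores, gamma=0.3, p=2):
    """Assumed smooth surrogate for 1 - WMW (hypothetical form).

    Each pair whose margin (x - y) falls below gamma contributes
    (gamma - margin) ** p; pairs separated by at least gamma contribute 0.
    With p > 1 the penalty is differentiable in the scores, so it can be
    minimized with gradient-based training.
    """
    total = 0.0
    for x in pos_scores:
        for y in neg_scores:
            margin = x - y
            if margin < gamma:
                total += (gamma - margin) ** p
    return total


# Illustrative scores only; not data from the paper.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print("WMW / AUC:", wmw_statistic(pos, neg))
print("surrogate loss:", soft_wmw_loss(pos, neg))
```

The step function inside `wmw_statistic` has zero gradient almost everywhere, which is why a smooth pairwise penalty of this kind is needed before gradient descent can be applied. The pairwise loop costs time proportional to the number of positives times the number of negatives, so a practical implementation would likely vectorize or subsample pairs.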

Similar resources

Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic

When the goal is to achieve the best correct classification rate, cross entropy and mean squared error are typical cost functions used to optimize classifier performance. However, for many real-world classification problems, the ROC curve is a more meaningful performance measure. We demonstrate that minimizing cross entropy or mean squared error does not necessarily maximize the area under the ...

A data-adaptive methodology for finding an optimal weighted generalized Mann-Whitney-Wilcoxon statistic

Xie and Priebe [2002. “Generalizing the Mann–Whitney–Wilcoxon Statistic”. J. Nonparametric Statist. 12, 661–682] introduced the class of weighted generalized Mann–Whitney–Wilcoxon (WGMWW) statistics which contained as special cases the classical Mann–Whitney test statistic and many other nonparametric distribution-free test statistics commonly used for the two-sample testing problem. The two-sa...

An optimal Wilcoxon-Mann-Whitney test of mortality and a continuous outcome.

We consider a two-group randomized clinical trial, where mortality affects the assessment of a follow-up continuous outcome. Using the worst-rank composite endpoint, we develop a weighted Wilcoxon-Mann-Whitney test statistic to analyze the data. We determine the optimal weights for the Wilcoxon-Mann-Whitney test statistic that maximize its power. We derive a formula for its power and demonstrat...

Statistical Analysis of Hippocampus Shape Using a Modified Mann-Whitney-Wilcoxon Test

The Mann-Whitney-Wilcoxon (MWW) test statistic, while distribution-free, suffers from a loss of efficacy for certain underlying distributions. In this manuscript, we instead use a data-adaptive weighted generalized Mann-Whitney-Wilcoxon (AWGMWW) test statistic, one that is optimal in the Pitman Asymptotic Efficacy (PAE) sense, to discern differences in hippocampus shape among twin populations w...

A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, w...

Publication date: 2002